Abstract



Long-term preservation of data and software of large experiments and detectors in high energy physics is of utmost
importance to secure the heritage of (mostly unique) data and to allow advanced physics (re-)analyses at later times.
Summarising the work of an international study group, motivation, use cases and technical details are given for an
organised effort to secure and enable future use of past, present and future experimental data. As a practical use case
and motivation, the revival of JADE data and the corresponding latest results on measuring α_s in NNLO QCD are
reviewed.
1. Introduction


Analyses of data from large-scale projects in experimental
particle physics are usually pursued for a typical period of
5 years after close-down of the experiments. After this time
of post-mortem analyses, the number of active members of
large collaborations dwindles to zero, as does the active
maintenance of the data and software needed to analyse
these data efficiently. While the data, e.g. those obtained
from 11 years of running of the electron-positron collider
LEP, or from 16 years of running of the lepton-hadron
collider HERA, remain of unique importance and relevance
for the field of high energy particle physics, the long-term
storage of these data and, especially, the possibility to
analyse them using the mandatory software packages and
know-how of detector particularities is, in almost all cases,
not guaranteed after a relatively short period of time. In
fact, the data of many past collider projects are already
lost or are in an unusable state, and as this contribution is
being written up, data e.g. from LEP experiments continue
to be lost forever.

An international study group for data protection and future
use of high energy physics data, DPHEP, has formed and has
presented [1] its first assessments of possible use cases
and the technicalities of data and software preservation.
In the following, the physics case for data preservation
and re-analysis will be given, and will be demonstrated by
recent physics results obtained from data of the JADE
experiment which operated from 1979 to 1986 at the e+e-
collider PETRA at DESY. Some details of preservation models
will also be summarised.
2. Physics case

The most important scientific reasons for long-term data 
protection and future use of data from past experiments 
are the following: 

• long term completion and extension of the scientific 
program of the project: 

The original program of a large scientific project is
usually not completed at the time of shutdown of the
experiment(s), and is not completely finalised even after
a period of a few additional years of data analyses,
when availability and usability of data and software
deteriorate due to the fast development of storage and
computer hard- and software systems, and due to the
fading availability of expert knowledge and personnel.

• cross collaboration analyses: 

maximum value and sensitivity of the data of large 
collider projects can be achieved by combining the 
data statistics of several experiments which operated 
on these facilities. Such combinations, however, are 
often not completed, or not even started at the time 
when projects end. 

• data re-use: 

due to the general development of scientific knowl- 
edge, new questions may arise and/or new theoret- 
ical models and experimental methods may become 
available which make re-analyses of old data manda- 
tory, if no such data are available from newer or active 
projects. 

• training, education and outreach: 

there are many examples of successful use of data
and analysis tools from past experiments to train and
educate students and even pupils on modern scientific
questions and methods; data, results and simulations
of past experiments are often used for outreach purposes
since the "owner's rights" on public access to
such data are usually less restrictive after close-down
of the project.

3. The JADE experiment at PETRA: past and present

One of the few practical examples of successful usage of 
data from a large experiment, up to 30 years after the data 
have actually been taken, is the preservation and reanalysis 
of data from the JADE experiment which operated at 
the PETRA e+e- collider 3 at the DESY laboratory at
Hamburg. 

JADE was one of the first symmetric and maximally 
hermetic multi-purpose, electronic particle detectors, com- 
prising a high resolution gas tracking (jet) chamber, placed 
in a hermetic solenoidal magnetic field of 0.5 Tesla, sur- 
rounded by an electromagnetic calorimeter and a hermetic 
muon filter and muon detector system. PETRA delivered
electron-positron collisions at centre-of-mass energies
from 12 to 46 GeV. In total, about 200 pb^-1 of high-quality
collision data was taken by JADE during its lifetime,
corresponding to about 45,000 well-reconstructed
multi-hadronic final state events 0].

JADE, together with 4 other experiments at PETRA,
took data from 1979 to 1986, when the PETRA collider
was shut down and construction work for the HERA collider
began. The data and software files continued to be
actively used for a few more years, until 1990/1991, when
the last analysis results were published. After that time,
the data, residing on archive tapes, were physically removed
from the DESY computer centre and stored in aluminium
boxes. Space limitations at DESY imposed physical
destruction of these tapes by 1997.

The source code of the JADE software framework was
collected and stored on private computer accounts which
were maintained until the IBM mainframe computer was
phased out at DESY in 1997. The JADE collaboration
had neither a plan nor a model for further data
preservation and future use of their data.

The post-mortem project of JADE data and software
revival is due to the interest and initiative of a few
former JADE members, who started it between 1995 and
1997, just in time to prevent the otherwise inevitable
loss of data and software.

In 1997, about 1 TB of raw and calibrated data and
MC production was moved from a few thousand archived
round tapes (160 MB per tape) to 600 IBM 3490 tapes
(800 MB per tape). A second copy of the data was made
on 200 Exabyte cartridges (2.5 GB each) Q]. No MC-generated
data files were preserved at that stage. In 2005, the
Exabyte cartridges travelled to Munich, were transferred
to disk, and are now a (very small) part of the ATLAS
data storage at the LHC Tier-2 centre of the Max Planck
Society computing centre at Garching.

The reactivation of the software was successfully
completed in 1999. It required adapting the JADE software
code, originally consisting of FORTRAN IV, but also
partly of SHELTRAN, MORTRAN and Assembler code,
to UNIX platforms and modern FORTRAN compilers.

Today, the generation of model collision events, using
modern physics Monte Carlo generators with full detector
simulation, is possible again, and simulated as well as
real data events can be examined using a revived version
of the original JADE event display with enhanced options
like colour (which was not available during JADE running
time). The revived JADE software runs on IBM AIX machines,
relying on the fact that these systems use the same byte
order as the IBM 370 did. The revitalisation, details of
emulation routines and the usage of the software packages
and data files are documented in a dedicated JADE
computing note @.
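
The reliance on a common byte order can be made concrete with a small
sketch. It is not taken from the JADE software: the data word and the
function names are hypothetical, and only Python's standard struct module
is used. A 32-bit integer written most-significant byte first (big-endian,
the order used by the IBM 370 and by IBM AIX systems) is read back
correctly only if the reader assumes the same order:

```python
import struct


def write_word_big_endian(value: int) -> bytes:
    """Pack a signed 32-bit integer most-significant byte first."""
    return struct.pack(">i", value)


def read_word(raw: bytes, big_endian: bool) -> int:
    """Unpack a signed 32-bit integer using the given byte order."""
    fmt = ">i" if big_endian else "<i"
    return struct.unpack(fmt, raw)[0]


# Hypothetical data word, e.g. a year or run number stored on tape.
raw = write_word_big_endian(1979)

same_order = read_word(raw, big_endian=True)    # reader shares the writer's order
other_order = read_word(raw, big_endian=False)  # reader assumes the opposite order

print(raw.hex(), same_order, other_order)
```

A reader that assumes the opposite order obtains a different number, so on
such a platform every binary word would have to be byte-swapped on input
before the data could be used.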

The complexity of the software code and the data structures,
of the detector hardware and its simulation, however,
still requires the knowledge of experts for analysing
these data. This knowledge is currently maintained at the
Max-Planck-Institute for Physics at Munich and at DESY.
The data and their usage are still "owned" by the original
JADE collaboration, such that no general "open access"
to these data is granted.

4. Physics benefits: new results from old data 

Improvements motivating reanalyses of the JADE data,
due to advanced theoretical knowledge and analysis methods
compared to those available at PETRA times, are summarised,
with special attention to the study of hadronic final
states, in Table 1. Enhanced and more profound theoretical
knowledge, more sophisticated Monte Carlo (MC) and
hadronisation models, improved and optimised experimental
observables and methods, and a much deeper understanding
and precise knowledge of the Standard Model of electroweak
and strong interactions make it mandatory and beneficial
to reanalyse old data and to significantly improve their
scientific impact.

In general, these benefits can be used to

• re-do previous measurements, with increased preci- 
sion and reduced systematic uncertainties; 

• perform new measurements, at energies and for processes
where no other data are available today;

• if new phenomena are found today: go back and check 
at lower energies. 

4.1. JADE data and LEP parametrisation: universality of
hadronisation

One of the first surprises when starting to reanalyse JADE
data was to realise that newly generated Monte Carlo
model events, based on modern QCD shower models like
JETSET, HERWIG and PYTHIA, using parameters as
optimised much later by the experiments at the LEP collider,
described the JADE data, at much smaller c.m. energies
than at LEP, to a degree never obtained at PETRA times
@, 0]. In detail, hadronic event shape distributions are
correctly described at all energies, down to 14 GeV, by
the models without the need to re-adjust model parameters
at each c.m. energy (see Fig. 1), a fact never achieved
at PETRA times, where models required significant retuning
of parameters at each major energy.

Table 1: Possible improvements motivating reanalyses of old data, with past and presently available knowledge and methods.

improvement                                then (at/after PETRA)        now (after LEP)
new and improved theoretical calculations  QCD in (N)LO                 QCD in resummed NNLO
new and improved MC models                 fixed order (N)LO            NLLA & NLO shower
new and optimised observables              event shapes: T, S, O, ...   B_W, B_T, D_3, Durham jets, ...
more complete knowledge of Standard Model                               top quark, W, Z, ...

There is an important physics result behind this observation:
the process of hadronisation as implemented in these models
does not depend on the c.m. energy, such that studies of
physical parameters, like the size and the energy dependence
of the strong coupling constant, α_s, can be pursued with a
minimum of systematic uncertainties. Moreover, using the
JADE data sample, such measurements can be done in the
entire PETRA energy range, where the energy dependence of
α_s is expected to be much larger than at the higher
energies of LEP. Note that at PETRA times, the insufficient
quality of modelling the lowest energy data, around 14 and
22 GeV, prevented significant studies of those data.

4.2. The running coupling α_s in NNLO QCD

The latest study and re-use of JADE data [&] demonstrates
the physical value of old data at times far after the active
time of the experiment: measurements of the coupling
parameter of the strong interaction, α_s, can now be pursued
with much higher precision and considerably smaller
systematic uncertainties than at the times of PETRA. All of
the improvements listed in Table 1 apply and increase the
significance of such measurements today, using the data of
the past.

The status of α_s measurements at the time of PETRA has
been reviewed elsewhere. It can be summarised by quoting
α_s(35 GeV) = 0.14 ± 0.02, where the error was largely
dominated by hadronisation and QCD uncertainties.

The results of reanalysing the JADE data Q, using
modern event shape and jet distributions and the most
recent and advanced predictions in resummed
next-to-next-to-leading order (NNLO) of QCD perturbation
theory [11], are shown in Figure 2 as a function of the
c.m. energy. Also shown is the prediction of the running
α_s in 3-loop QCD perturbation theory, for the central fit
value of α_s(M_Z0) to all JADE data,

    α_s(M_Z0) = 0.1172 ± 0.0020 (exp.) ± 0.0046 (th.) .

The results are also compared to a similar analysis using
LEP data.

The value of reusing JADE data is obvious: α_s runs
with energy as predicted by QCD, which is demonstrated
with high significance by the JADE data alone, manifesting
the concept of Asymptotic Freedom [l| (see [14] for a recent
review of measurements of α_s). No other such results are
currently available in this energy regime. They are, due to
many improvements in the field during the past 20 years,
significantly more precise than what was achieved during
and shortly after the actual running of PETRA.
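
The size of the running can be illustrated with a short numerical sketch.
It is not the analysis described above, which uses 3-loop evolution: as a
simplifying assumption, only the one-loop solution of the renormalisation
group equation is used, α_s(Q) = α_s(M_Z) / (1 + b0 α_s(M_Z) ln(Q²/M_Z²))
with b0 = (33 - 2 n_f) / (12π). The values n_f = 5 and M_Z = 91.19 GeV,
the upper energy of 206 GeV chosen to represent LEP, and the function name
are assumptions made for this illustration; the starting value is the
central fit result quoted in the text.

```python
import math

ALPHA_S_MZ = 0.1172   # central JADE fit value quoted in the text
M_Z = 91.19           # GeV; assumed value of the Z0 mass
N_F = 5               # assumed number of active quark flavours


def alpha_s_one_loop(q_gev: float) -> float:
    """One-loop running of alpha_s from M_Z to the scale q_gev (in GeV)."""
    b0 = (33.0 - 2.0 * N_F) / (12.0 * math.pi)
    log_term = math.log(q_gev**2 / M_Z**2)
    return ALPHA_S_MZ / (1.0 + b0 * ALPHA_S_MZ * log_term)


# PETRA energies, the Z0 pole, and an assumed representative upper LEP energy.
for q in (14.0, 22.0, 35.0, 44.0, 91.19, 206.0):
    print(f"Q = {q:6.2f} GeV   alpha_s = {alpha_s_one_loop(q):.4f}")
```

In this approximation α_s changes by roughly 0.03 between 14 and 44 GeV,
but only by roughly 0.01 between M_Z and 206 GeV, which illustrates why
the PETRA energy range is particularly sensitive to the running.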

5. International effort of data preservation: 
DPHEP 

While the JADE example is one of the few existing examples
of preserving and reusing data and software of a
complex experiment in high energy physics, it is known
that the data of many other experiments are already
irretrievably lost, and/or cannot be used any more due to
the lack of functional software and analysis environment.
The LEP experiments, more than 10 years after active data
taking, still report occasional analyses today. However, it
is known that the data, as well as the corresponding
software environments, are beginning to fade away, and some
losses of data (archive tapes) have already been
communicated.

In 2009, an international initiative to preserve data in
high energy physics (DPHEP) was formed and worked out
arguments, technical details and governance policy for a
concerted effort to preserve and re-use data of recent and
current large-scale high energy physics experiments. In
particular, 4 different levels of data preservation models
have been defined, as summarised in Table 2. These levels
are inclusive, i.e. higher levels include the details of those
before. In general, they differ in their overall purpose,
in their degree of flexibility and in the amount of effort
necessary to maintain them.

While levels 1 and 2 are realised in a number of projects
and cases, they do not allow new or improved analyses
(compared to what was published in the past) to be
performed. Level 3 provides some limited means for new
analyses. Only level 4, however, gives the flexibility which
provides the full future potential of data analysis. It is
also the most intricate model, as it requires significant and
sustained efforts of preparation, maintenance and validation.
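
The inclusive structure of the levels can be written down as a small data
structure. This is only an illustrative sketch: the level descriptions are
paraphrased from the DPHEP classification in Table 2, while the class and
function names are hypothetical and not part of any DPHEP software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreservationLevel:
    level: int
    preserves: str
    use_case: str


LEVELS = (
    PreservationLevel(1, "additional documentation",
                      "publication-related information search"),
    PreservationLevel(2, "data in a simplified format",
                      "outreach, simple training analyses"),
    PreservationLevel(3, "analysis-level software and data format",
                      "full scientific use based on existing reconstruction"),
    PreservationLevel(4, "reconstruction and simulation software and basic-level data",
                      "full potential of experimental data"),
)


def preserved_up_to(level: int) -> list[str]:
    """Levels are inclusive: level n also contains everything below it."""
    return [entry.preserves for entry in LEVELS if entry.level <= level]


def allows_new_analyses(level: int) -> bool:
    """Per the text, only levels 3 (limited) and 4 (full) permit new analyses."""
    return level >= 3


for n in range(1, 5):
    print(n, allows_new_analyses(n), preserved_up_to(n))
```

The cumulative list returned for level 4 makes the cost argument visible:
the most flexible model has to maintain everything the lower levels
maintain, plus the reconstruction and simulation chain.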
Table 2: Data preservation models and different use cases as worked out by the DPHEP study group.

Level  Preservation model                                    Use case
1      provide additional documentation                      publication-related information search
2      preserve the data in simplified format                outreach, simple training analyses
3      preserve the analysis level software and data format  full scientific use based on existing reconstruction
4      preserve reconstruction and simulation software       full potential of experimental data
       and basic level data

A typical time-line of a level-4 data preservation model
is given in Figure 3. It relates the times of data taking,
of collaboration life time, of data preservation R&D, of
long-term analysis and a possible and final "open access"
period with the new organisation of physics supervision
and resources needed to pursue such a project (in units
of FTEs, as a function of the number of years). Further
details on the questions of technologies, facilities, funding,
governance, supervision and authorship rights are given
in [1].

Future usability and preservation of data of large HEP
experiments is mandatory, both on grounds of scientific
importance and of sustaining publicly funded heritage.
While extra resources must be identified to pursue active
data preservation, the necessary amount of such resources
is only at the level of a very few percent of the original
investments. Failing to do so, i.e. accepting the loss of
data and their scientific usability, may be regarded as a
crime, given the large amounts of money and manpower which
were invested in the original experimental programs.
Figure 1: Distributions of 1-T data at c.m. energies from 14 to 44
GeV. The data are compared to model predictions with hadronisation
parameters optimised at LEP.

Figure 2: Measurements of α_s(Q) from JADE data, in the energy range
from Q = 14 to 44 GeV, using event shapes and QCD predictions in
resummed NNLO & NLLA perturbation theory. The results from a similar
analysis of OPAL data (preliminary) are also included.

Figure 3: Timeline and need of resources for a data preservation model
at level 4, i.e. maintaining full flexibility for future and long-term use of
the data of a HEP experiment after its termination.